Transaction credit control for serial I/O systems

ABSTRACT

A method and implementing computer system are provided which allow for significantly improved input/output (I/O) subsystem designs in all systems which include serialized I/O transactions, such as so-called Express specification systems. Transaction control methodology is implemented to improve Express design requirements for Express devices such as an Express switch, Express-PCI bridge, endpoint, and root complex. This is accomplished by utilizing improved transaction ordering, an improved state machine and corresponding buffer design, and an improved flow control credit methodology, which together enable improved processing for controlling transactions flowing through Express devices, including Express switches and Express-PCI bridges. An Express-PCI/PCIX transition bridge design is also provided, along with the flow control credit methodology and implementation within the Express-PCI/PCIX bridge design, to enable efficient interfacing between Express and legacy or existing PCI/PCIX systems.

RELATED APPLICATIONS

[0001] Subject matter disclosed and not claimed herein is disclosed and claimed in related co-pending applications, Ser. No. ______, Attorney Docket NK-2002-101, and Ser. No. ______, Attorney Docket NK-2002-102, which are filed on even date herewith.

FIELD OF THE INVENTION

[0002] The present invention relates generally to information processing systems and more particularly to a methodology and implementation for buffer management and transaction control for serialized input/output transactions.

BACKGROUND OF THE INVENTION

[0003] In computer systems today, the predominant input/output (I/O) subsystem in notebooks, desktops, and servers is based on either the so-called PCI (peripheral component interconnect) or PCIX bus (see the Revision 2.3 PCI Local Bus Specification dated Mar. 29, 2002, and the Revision 1.0a PCIX Addendum to the PCI Local Bus Specification dated Jul. 24, 2000). However, in order to keep pace with the growing performance and scalability needs of the future, the PCI-SIG (Peripheral Component Interconnect Special Interest Group) is adopting a new PCI interconnect called “PCI Express”, hereinafter referred to as “Express”. Express is also referred to as “3GIO” in some versions of the Express specification. Express is a serial point-to-point switched fabric interconnect that utilizes the same programming model as the current PCI and PCIX bus definitions. PCI and PCIX provide a set of transaction ordering rules that define whether a second transaction of a given type must or must not be allowed to bypass a first transaction of a given type. These transaction ordering rules result in significant complexity in PCI and PCIX devices, especially for PCIX-PCIX (and PCI-PCI) bridges. Express also introduces the concept of multi-port switches. The Express specification defines an Express switch as a logical assembly of multiple virtual PCI-PCI bridge devices that have one primary interface and multiple secondary interfaces, with each external interface being an Express serial interface. An Express switch is by definition even more complex than today's typical PCIX-PCIX bridge (which is itself a very complex device). Express carries over the transaction ordering rules of PCI essentially unchanged; combined with the serial nature and other features of Express, this results in very significant complexity for Express devices and introduces other problems.

[0004] Thus, there is a need for an improved method, circuit, and system for Express switches, Express-PCI bridges and other Express devices to improve transaction ordering and buffer management requirements for data consistency, and also to avoid data transfer congestion and deadlocks.

SUMMARY OF THE INVENTION

[0005] A method and implementing computer system are provided which allow for much improved input/output (I/O) subsystem designs for use in serialized I/O transaction systems, including Express systems. To achieve improved scalability, Express adds to PCI/PCIX a serial point-to-point signaling capability at the Express link and chip interface. This invention defines means to greatly improve Express design requirements, making the design of Express devices such as an Express switch, Express-PCI bridge, endpoint, or root complex more efficient, less complex and therefore less costly. This is accomplished by improving the requirements for input buffer designs and transaction credit types, and the credit control for managing the flow of transactions across a serial I/O link. The improved transaction credit and flow control results in significant performance improvements as transactions flow through Express devices and over Express links.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] A better understanding of the present invention can be obtained when the following detailed description of a preferred embodiment is considered in conjunction with the following drawings, in which:

[0007] FIG. 1 is an illustration of a computer station, which is enabled for connection to a computer network;

[0008] FIG. 2 illustrates several major components of the computer system of FIG. 1;

[0009] FIG. 3 illustrates a high level logical block diagram of an Express switch;

[0010] FIG. 4 illustrates an exemplary embodiment of improved transaction ordering requirements for the PCI/PCIX domain in Express-PCI bridges;

[0011] FIG. 5 illustrates a description of the Express Traffic Classes;

[0012] FIG. 6 illustrates an exemplary embodiment for avoiding congestion and potential system crashes due to overrun of device workload capacity;

[0013] FIG. 7 illustrates an exemplary embodiment of Express buffers utilizing improved Express transaction ordering and buffer management;

[0014] FIG. 8 illustrates an exemplary Express switch embodiment utilizing improved Express transaction ordering and buffer management;

[0015] FIG. 9 illustrates an exemplary Express-PCI bridge implementation utilizing improved Express transaction ordering and buffer management in the Express domain, and utilizing improved transaction ordering and buffer management in the PCI/PCIX domain; and

[0016] FIG. 10 illustrates a portion of the Express switch logic for one of the Express switch input ports as shown in FIG. 8 with an improved method for defining and managing flow control credits.

DETAILED DESCRIPTION

[0017] The exemplary embodiment of the present invention is described herein relative to the so-called “Express” Specification, although it is understood that the invention is not limited to the Express system but rather applies to other systems which include serialized I/O transactions. The current Express Base Specification includes a definition for transaction ordering rules that are essentially the same as the ordering rules for PCI and PCIX as defined in the Revision 2.3 PCI Local Bus Specification dated Mar. 29, 2002, and the Revision 1.0a PCIX Addendum to the PCI Local Bus Specification dated Jul. 24, 2000. To achieve improved scalability, Express adds serial signaling capability at the Express link and chip interface, among other improvements. These scalability improvements increase the design complexity of Express chips, especially for Express switch devices, as well as for endpoints and the root complex. The current PCI and PCIX transaction ordering requirements only add to this design complexity, since some transactions must be allowed to bypass certain other transactions, while some transactions must not be allowed to bypass certain other transactions. This invention defines means to significantly improve the transaction ordering requirements, buffer management, flow credit definition and control, and other aspects of Express, making the design of Express devices much less complex and less costly, with higher performance, while avoiding congestion and deadlock, especially for Express switch and Express-bridge designs.

[0018] The various methods discussed herein may be implemented within a computer network including a computer system which may comprise, for example, a server, workstation, or PC. In general, an implementing computer system may include computers configured with a plurality of processors in a multi-bus system in a network of similar systems. However, since the workstation, computer system, or server implementing the present invention in an exemplary embodiment is generally known in the art and composed of electronic components and circuits which are also generally known to those skilled in the art, circuit details beyond those shown in FIG. 1 are not specified to any greater extent than that considered necessary, as illustrated, for the understanding and appreciation of the underlying concepts of the present invention and in order not to obfuscate or distract from the teachings of the present invention.

[0019] In FIG. 1, an exemplary computer system 101 includes a processor unit 103 which is typically arranged for housing a processor circuit along with other component devices and subsystems of the computer system 101. The computer system 101 also includes a monitor unit 105, a keyboard 107 and a mouse or pointing device 109, which are all interconnected with the computer system illustrated. Also shown is a connector 111 which is arranged for connecting a modem within the computer system to a communication line, such as a telephone line in the present example.

[0020] Several of the major components of the system 101 are illustrated in FIG. 2. Referring to FIG. 2, a processor circuit 201 is connected to a root complex 203, which denotes the root of an I/O hierarchy that connects the processor and memory subsystem to the I/O. It is noted that the processing methodology disclosed herein will apply to many different interconnect and/or network configurations. A cache memory device 205 and a system memory unit 207 are also connected to the root complex 203. Also connected to the root complex are several Express serial point-to-point connections 204A, 204B and 204C connecting to several corresponding Express endpoints 208A, 208B and 208C. An endpoint refers to a type of device that can be the requester or completer of an Express transaction, whether on its own behalf or on behalf of a distinct non-Express device (other than a PCI device or host processor). Also connected to the root complex 203 over an Express link 233 in FIG. 2 is an Express switch 206. The switch 206 also includes a number of additional Express links 202A-202G. Connected to Express links 202A, 202B, 202C, 202E and 202F are a number of additional endpoints 212A, 212B, 212C, 212E and 212F, respectively. Endpoint 208B is connected to a storage device 218 and endpoint 208C is connected to a sound subsystem 224 in the FIG. 2 example. A modem 209 is arranged for connection 210 to a communication line, such as a telephone line, through a connector 111 (FIG. 1). The modem 209, in the present example, selectively enables the computer system 101 to establish a communication link and initiate communication with a network server, such as through the Internet.

[0021] Endpoint 212B is connected through an input interface circuit 211 to a keyboard 213 and a mouse or pointing device 215. Endpoint 212A is coupled to a network through a network interface 217 in the example. A diskette drive unit 219 is also shown as being coupled to an endpoint 212E. A video subsystem 220, which may include a graphics subsystem, is connected between endpoint 208A and a display device 221. A storage device 218, which may comprise a hard drive unit, is also coupled to an endpoint 208B. The diskette drive unit provides a means by which individual diskette programs may be loaded onto the hard drive, or accessed directly, for selective execution by the computer system 101. As is well known, program diskettes containing application programs represented by magnetic media on the diskette, or optically readable indicia on a CD, may be read from the diskette or CD drive, and the computer system is selectively operable to read such media and create program signals. Such program signals are selectively effective to cause the computer system to present displays on the screen of a display device and respond to user inputs in accordance with the functional flow of an application program. Again referring to FIG. 2, an Express-PCI bridge 225 is included which provides for attachment of legacy (i.e., earlier version) PCI and PCIX devices to PCI bus 227. The Express-PCI bridge 225 is coupled through link 202G to the switch 206. A second Express-PCI bridge 229 is coupled through link 202D to switch 206. Express-PCI bridge 229 also provides an additional PCI/PCIX bus 231.

[0022] FIG. 3 illustrates a high level block diagram of an Express switch or bridge 301. The Express specification defines an Express switch as a logical assembly of multiple virtual PCI-PCI bridge (or PCIX-PCIX bridge) devices. The upstream port of the Express switch or bridge 301 provides an Express interface 303 to an Express link 304. The “upstream” port interface 303 is the one closer to the system processor. The Express switch 301 also includes multiple downstream ports providing downstream Express interfaces 306, 308 and 310 to additional Express links 316, 318 and 320, respectively. Connected to the upstream interface is a logical PCIX-PCIX bridge 313, which in turn connects to an internal PCIX bus 311. Also connected to the PCIX bus 311 are multiple logical PCIX-PCIX bridges 336, 338 and 340, which in turn connect through their respective downstream Express interface circuits 306, 308 and 310 to Express interconnects 316, 318 and 320. The Express switch, which contains multiple virtual PCIX-PCIX bridges, is by definition much more complex than a typical PCI-PCI bridge (or PCIX-PCIX bridge) that contains one upstream port and only one downstream port.

[0023] With Express, transactions and data are moved within the Express “fabric” via packets. A packet is a fundamental unit of information transfer consisting of a header that, in some cases, is followed by a data payload. The Express architecture also includes a definition of a traffic class (TC), defined by a 3-bit field such as “000” or “111”. FIG. 5 illustrates a definition of Express traffic classes. The traffic class allows differentiation of transactions from different devices into eight traffic classes. The lowest TC (000) is utilized for general purpose I/O and must be supported by all Express devices. The highest TC (111) is utilized for isochronous transactions that have real time priority requirements. The other TC 3-bit combinations represent other differentiated service classes (differentiated based on weighted-round-robin and/or other priority processing requirements).

[0024] Express also supports the concept of “virtual channels” (VC). Virtual channels provide a means to implement multiple logical data flows for different devices over a given physical link. Each link must support at least one virtual channel, VC(0). The TC field is transmitted unmodified end-to-end through the fabric and is used by each component to determine how to process the packet relative to other packets within the component. Together with the Express virtual channel support, the TC mechanism is a fundamental element of Express for enabling differentiated traffic servicing. As a packet traverses the fabric, this information is used at every link and within each switch element to make decisions relative to proper servicing of the traffic, such as the routing of the packets based on their TC labels through corresponding virtual channels. It is up to the system software to determine TC labels and the TC/VC mapping in order to provide differentiated services that meet target platform requirements. An example would be a system that supports isochronous transactions. In this case TC7 (111) would be utilized for isochronous transactions, and TC7 must be mapped to the VC with the highest weight/priority.
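As a rough illustration of how system software might program a TC/VC mapping, the following minimal Python sketch decodes the 3-bit TC field and builds a table that places TC0 on VC0 and TC7 on the highest-numbered VC. The function names, and the assumptions that the highest-numbered VC carries the highest weight/priority and that the remaining TCs are folded onto the lower VCs, are illustrative choices rather than requirements of the Express specification.

```python
NUM_TCS = 8  # a 3-bit TC field encodes TC0 ("000") through TC7 ("111")


def tc_from_field(bits: str) -> int:
    """Decode a 3-bit TC field such as "000" or "111" into 0..7."""
    if len(bits) != 3 or set(bits) - {"0", "1"}:
        raise ValueError("TC field must be exactly three binary digits")
    return int(bits, 2)


def build_tc_vc_map(num_vcs: int) -> dict[int, int]:
    """Return a hypothetical TC -> VC table for a port with num_vcs VCs.

    Assumptions: VCs are numbered 0..num_vcs-1, the highest-numbered VC has
    the highest weight/priority, and TCs 0..6 are folded onto the lower VCs.
    """
    if not 1 <= num_vcs <= 8:
        raise ValueError("Express allows between 1 and 8 virtual channels")
    if num_vcs == 1:
        return {tc: 0 for tc in range(NUM_TCS)}  # every TC shares VC0
    mapping = {tc: min(tc, num_vcs - 2) for tc in range(NUM_TCS - 1)}
    mapping[7] = num_vcs - 1  # isochronous TC7 -> highest-priority VC
    return mapping
```

With eight VCs the sketch yields a one-to-one mapping; with a single VC every TC shares VC0, which corresponds to the low-cost end of the tradeoff.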

[0025] Traffic between Express devices over an Express link is managed via a flow control mechanism. Express defines flow control as a method for communicating receiver buffer information from a receiver to a transmitter to prevent receiver buffer overflow. Flow control credits are issued by a receiver to the transmitter, indicating whether and how many transactions, or how much data, the transmitter can send to the receiver. A transmitter cannot send a transaction or data to a receiver unless it has the appropriate flow control credits.
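The credit rule can be sketched as follows. This hypothetical model assumes one credit per receiver buffer slot and a single credit pool per link direction; actual Express credit accounting is finer grained, so the sketch only illustrates the rule that a transmitter without credits must wait rather than attempt the transfer.

```python
from collections import deque


class CreditedLink:
    """One direction of a link under credit-based flow control (sketch).

    Assumption: one credit == one receiver buffer slot, single credit pool.
    """

    def __init__(self, receiver_slots: int):
        self.credits = receiver_slots        # initially advertised by the receiver
        self.receiver_buffer = deque()

    def try_send(self, txn) -> bool:
        """Transmitter side: send only when a credit is held."""
        if self.credits == 0:
            return False                     # no credit: the transaction is not attempted
        self.credits -= 1
        self.receiver_buffer.append(txn)     # the transfer completes; no copy stays behind
        return True

    def receiver_drain(self):
        """Receiver side: consume one entry and return its credit."""
        txn = self.receiver_buffer.popleft()
        self.credits += 1
        return txn
```

Because a send is only attempted when space is guaranteed, every attempted transfer succeeds, and the transmitter never has to keep a retry copy.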

[0026] The key parts of VCs are the independent fabric resources (queues/buffers and associated control logic). These resources are used to move information across Express links, with the flow control of each VC fully independent of the flow control of the other VCs. This avoids flow-control-induced blocking, where a single traffic class may create a bottleneck for all traffic in the system. Traffic is associated with VCs by mapping packets with particular TC labels to their corresponding VCs. The Express VC mechanism allows flexible mapping of TCs onto the VCs. In the simplest form, TCs can be mapped to VCs on a one-to-one basis. To allow performance/cost tradeoffs, Express also allows mapping of multiple TCs to a single VC.

[0027] FIG. 4 illustrates improved transaction ordering requirements 401 for the PCI domain of the Express-PCI bridge. In FIG. 4, the top row represents the first transaction of a sequence. Each column in the top row is designated with one of several types of transactions which could occur as the first transaction. For example, the first transaction could be a posted memory write (PMW) as shown in column 2, a read request (RR) as shown in column 3, a write request (WR) as shown in column 4, a read completion (RC) as shown in column 5, or a write completion (WC) as shown in column 6. Rows A-E illustrate the second transaction of the sequence, which follows the transaction type designated in the top row. The entry at the intersection of any listed column and any listed row gives the transaction ordering rule; that is, the entry indicates whether the second transaction in a sequence must be allowed or not allowed to bypass the first transaction in a sequence as the second transaction makes its way through the PCI/PCIX domain of the Express-PCI bridge device in the direction the transaction is flowing. In FIG. 4, the transaction ordering rules all apply to transactions flowing in the same direction. Transactions flowing in the upstream direction have no ordering requirements relative to transactions flowing in the downstream direction.

[0028] FIG. 4 also includes a definition of the table entries. A “Y” or “YES” designation in a block means that the second transaction (in the corresponding row) must be allowed to pass the first transaction (in the corresponding column) to avoid deadlock. An “N” or “NO” designation means that the second transaction (in the corresponding row) must not be allowed to pass the first transaction (in the corresponding column) to preserve strong write ordering. A “Y/N” indicates there are no ordering requirements between the first and second transaction; that is, the second transaction may optionally pass the first transaction or be blocked by it.

[0029] Again referring to FIG. 4, there are five transaction types, shown in the columns from left to right and in the rows from top to bottom: a posted memory write (PMW), a read request (RR), a write request (WR), a read completion (RC), and a write completion (WC). As used herein, write requests to I/O or configuration space are referred to as write requests, and write completions from I/O or configuration space are referred to as write completions. Also, in this document, write requests to memory space are referred to as posted memory writes. As used herein, a “posted” memory write is a transaction that has completed on the originating bus before completing on the destination bus. The ordering rules defined in the Express specification result in very complex Express implementations and, combined with other Express features, can result in stalling and potential deadlock and/or system crashes.

[0030] One problem with current PCI systems relative to possible deadlocks is that Delayed Read Requests and Delayed Write Requests leave residual transactions (once the transaction has been attempted) at the head of buffers, which can cause deadlocks if proper bypassing rules are not followed. Examples of residual transactions are Delayed Requests (Delayed Read and Delayed Write) which have been accepted across a device interface. Once a Delayed Request is attempted across a bus from a first device to a second device, the request is now in the second device, but the same Delayed Request also remains at the head of the queue in the first device. The delayed request must continue to be attempted from the first device to the second device until the completion transaction becomes available. Once the completion transaction is available and the delayed request completes, the Delayed Request in the first device is destroyed, being replaced by the Delayed Completion transaction, now in the first device and moving in the opposite direction. For PCI, by definition, delayed request transactions result in residual delayed requests at the head of the buffer queues. These residual requests require bypassing rules that allow certain transactions to bypass them in order to avoid deadlocks. PCIX devices are required to be fully backward compatible with PCI, such that whenever a PCI device is installed on a bus segment, the PCIX devices on that bus segment must operate in PCI mode.
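The residual-request behavior described above can be modelled with a small hypothetical Python sketch: the first device keeps the delayed request at the head of its queue and re-attempts it until the target has the completion ready, so anything queued behind it (here a posted memory write) is blocked unless bypassing rules let it pass. The class names and the fixed retry latency are assumptions made for illustration.

```python
from collections import deque


class SlowTarget:
    """Hypothetical target that needs `latency` retries before completing."""

    def __init__(self, latency: int):
        self.latency = latency
        self.pending = {}  # request -> retries still required

    def attempt(self, request):
        remaining = self.pending.setdefault(request, self.latency)
        if remaining > 0:
            self.pending[request] = remaining - 1
            return None                      # completion not ready: retry later
        del self.pending[request]
        return f"completion({request})"


class DelayedRequestInitiator:
    """First device: the delayed request stays at the head until it completes."""

    def __init__(self, requests):
        self.queue = deque(requests)
        self.completions = []                # completions now held in this device

    def attempt_head(self, target) -> bool:
        head = self.queue[0]                 # residual copy remains queued
        completion = target.attempt(head)
        if completion is None:
            return False                     # everything behind `head` still waits
        self.queue.popleft()                 # residual request is destroyed...
        self.completions.append(completion)  # ...and replaced by the completion
        return True
```

With a target latency of two retries, the first two attempts fail and leave the read request at the head of the queue; only the third attempt retires it, while the queued write waits throughout.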

[0031] The current definition of Express has carried over these same transaction ordering requirements of PCI/PCIX with very little difference and indicates that those transaction ordering requirements apply throughout the Express fabric (including Express switches and Express-PCI bridges). In accordance with the present disclosure, residual transactions are not needed and are not utilized in Express devices or in the Express domain of Express-PCI bridges. Therefore the complex ordering requirements (as described in the Express specification), complex buffer design, and complex transaction ordering state machine required in PCI/PCIX bridges and devices are not needed in Express devices such as an Express switch and the Express domain of an Express-PCI bridge. Also, since Express utilizes a token credit based flow control mechanism to help control the issuing of transactions across the Express link, transactions are not attempted across an Express link unless the requester has received “credits” indicating there is space available on the other side of the link for the transaction. Once the transaction is attempted, it completes across the link, resulting in no need for a residual copy of the transaction on the requester side of the link. As a result, there are no requirements for second transactions to bypass first transactions within the Express domain.

[0032] This improved methodology avoids any need for bypassing in the Express domain and allows the Express transaction buffers to be implemented as a single input buffer set and a single output buffer set, with much less complex ordering requirements in which transactions exit the buffer sets in the same order as they entered the buffer sets. In the simplest case for an Express switch, only one input buffer set and only one output buffer set are required at each port (if the optional isochronous capability is not supported and only one VC is supported). If isochronous capability and multiple VCs are supported, then the input and output buffer sets as described would be implemented for each port and for each virtual channel. There are no ordering requirements for transactions flowing through different virtual channels. However, a fairness processing algorithm within internal arbiters must be utilized for resolving which transactions at the head of a given buffer set are given access to a given target port. The complex buffer designs and complex ordering rules state machine which are required for PCI are not required for Express devices which are implemented in accordance with the present disclosure. Instead, prior complex implementations can be replaced with the improved embodiments of this invention.
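The point that only the head of each buffer set competes for a target port can be illustrated with a round-robin arbiter. Round-robin is an assumed fairness policy chosen for this sketch (the text only requires some fairness algorithm), and the dictionary-based transaction records are hypothetical.

```python
from collections import deque


class RoundRobinArbiter:
    """Grant one target output port to the head of one input buffer set.

    Round-robin is an assumed fairness policy; each input buffer set is a
    FIFO, so only its head transaction is ever eligible.
    """

    def __init__(self, num_inputs: int):
        self.num_inputs = num_inputs
        self.last = num_inputs - 1  # start the first search at input 0

    def grant(self, fifos, target_port):
        """Pop and return (input index, transaction) for the next winner, or None."""
        for step in range(1, self.num_inputs + 1):
            i = (self.last + step) % self.num_inputs
            if fifos[i] and fifos[i][0]["target"] == target_port:
                self.last = i
                return i, fifos[i].popleft()
        return None
```

Because only heads are examined, a transaction never overtakes one ahead of it in the same buffer set; fairness is applied only between buffer sets, which is consistent with the absence of bypassing rules in the Express domain.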

[0033] These improvements are described in more detail in FIG. 7 and FIG. 8.

[0034] With the current Express specification, the potential exists for stalling of transactions in the Express fabric, potentially resulting in system crashes due to the processor overrunning the work capacity of I/O devices. If the device driver sends more commands to its device than the device can handle at a time, the device will fall behind in processing the commands. This can result in transactions backing up into the Express fabric, resulting in a stalling condition and potential system crashes. A device driver therefore needs to be aware of the number of tasks a device is able to queue up, beyond which further commands would cause Express transactions to back up into the fabric.

[0035] FIG. 6 illustrates an exemplary embodiment for device drivers and devices to track outstanding work tasks 601. There are a number of ways in which this could be accomplished. However, a preferred embodiment is to utilize a linked list of tasks in system memory. Referring to FIG. 6, the linked list would begin at some assigned offset, for example 24h 609. The “h” designation represents hexadecimal notation. Each task 607 would be posted by the device driver in the linked list in system memory, and a pointer to the first or next task 605 would also be included in the list, pointing to the next entry in the list. For example, the first pointer at the offset 609 is to an address of 3Ch. Additional exemplary pointers shown in hexadecimal are also included in FIG. 6. The device would pick up new work tasks to be handled by checking the task list in memory. Once the device completes a task or set of tasks, the device posts an interrupt to the processor indicating which task or tasks have been completed. The device driver then updates the linked list in system memory, posting any new tasks that need to be executed.
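A minimal sketch of the FIG. 6 task list, with system memory modelled as a Python dictionary keyed by address. The list offset 24h and the first entry at 3Ch follow the example in the text; the next address 54h, the use of 0 as an end-of-list marker, and the (task, next pointer) entry layout are hypothetical.

```python
NULL = 0x00  # assumed end-of-list marker


def append_task(memory: dict, list_offset: int, new_addr: int, task: str) -> None:
    """Driver side: place a task at new_addr and link it onto the list tail."""
    memory[new_addr] = (task, NULL)
    if memory[list_offset] == NULL:          # empty list: set the head pointer
        memory[list_offset] = new_addr
        return
    addr = memory[list_offset]
    while memory[addr][1] != NULL:           # walk to the current tail
        addr = memory[addr][1]
    tail_task, _ = memory[addr]
    memory[addr] = (tail_task, new_addr)


def walk_task_list(memory: dict, list_offset: int) -> list:
    """Device side: follow the pointers from the list offset, in order."""
    tasks, addr, seen = [], memory[list_offset], set()
    while addr != NULL:
        if addr in seen:
            raise ValueError(f"task list loops at {addr:#x}")
        seen.add(addr)
        task, addr = memory[addr]
        tasks.append(task)
    return tasks


memory = {0x24: NULL}                        # list offset 24h, initially empty
append_task(memory, 0x24, 0x3C, "task A")    # pointer at 24h -> 3Ch
append_task(memory, 0x24, 0x54, "task B")    # 54h is a hypothetical address
print(walk_task_list(memory, 0x24))          # ['task A', 'task B']
```

The device only reads the list, and the driver only appends to it, so neither side needs to push individual commands across the link.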

[0036] With this tracking mechanism, the device driver needs to be aware of how many tasks the device can accept at a time, and can ensure that it never assigns the device more work tasks than it has capacity to handle. Whenever the device driver updates the task list in memory, it then accesses the device to indicate that an updated task list is available in system memory. In this manner, the device should always be able to accept the read or write accesses to the device from the processor (device driver), thus avoiding backing up processor access attempts to the device into the Express fabric. This objective may also be accomplished by the device driver sending the tasks directly to the device, followed by the device indicating completion of a task or tasks with an interrupt. However, this would require more accesses to the device than the preferred embodiment described above, resulting in lower performance.
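The capacity rule can be sketched from the driver's side as follows. The class, its method names, and the driver-side backlog for excess work are assumptions for illustration; the point is only that the number of tasks handed to the device never exceeds its stated capacity, and that a completion interrupt frees room for more.

```python
from collections import deque


class DriverWorkTracker:
    """Driver-side bookkeeping that never exceeds the device's task capacity."""

    def __init__(self, device_capacity: int):
        self.capacity = device_capacity
        self.outstanding = set()   # tasks handed to the device, not yet completed
        self.backlog = deque()     # tasks waiting until the device has room

    def submit(self, task_id):
        """Queue a task; return the tasks actually handed to the device now."""
        self.backlog.append(task_id)
        return self._post()

    def on_interrupt(self, completed_ids):
        """Device reported completions: free capacity and post more work."""
        self.outstanding.difference_update(completed_ids)
        return self._post()

    def _post(self):
        # Move work from the backlog only while the device has spare capacity;
        # the driver would then notify the device once that the list changed.
        posted = []
        while self.backlog and len(self.outstanding) < self.capacity:
            task_id = self.backlog.popleft()
            self.outstanding.add(task_id)
            posted.append(task_id)
        return posted
```

A third task submitted to a device with capacity two is held back until an interrupt reports a completion.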

[0037] FIG. 7 is a schematic diagram illustrating a preferred embodiment for a portion of an Express switch 701, a significantly improved approach according to this invention that utilizes only one input buffer set 704 and only one output buffer set 710. The exemplary embodiment illustrated in FIG. 7 is for one port and one virtual channel. Transactions entering the input buffer set 704 of a given port come from the I/O interconnect and deserializer at that port. Transactions entering the output buffer set 710 of a given port can come from any other port of that switch. This approach is possible because the improved transaction ordering requirements for Express result in the need for only one input buffer set 704 and only one output buffer set 710. The buffer sets are organized such that transactions of all types flow through them and exit the buffers in the same order or sequence as they entered. Also illustrated in FIG. 7 is the set of all five transaction types 708, i.e. PMW (Posted Memory Write), RR (Read Request), WR (Write Request), RC (Read Completion), and WC (Write Completion), which can flow through the single input buffer set. Similarly, all transaction types can flow through the output buffer set 710. Transaction flows through the buffers are managed through Flow Control Credits and Transaction Ordering Control circuitry 720.

[0038] FIG. 8 illustrates a detailed drawing of a preferred embodiment for an Express switch device such as switch 206 in FIG. 2. In FIG. 8, a multi-port Express switch 801 has a single upstream (toward the processor) Express serial port 805 producing a serial I/O interconnect 803, and multiple downstream Express serial ports 859 and 861 producing serial I/O interconnects 863 and 865, respectively. Express allows up to eight (8) virtual channels, which are illustrated in phantom in FIG. 8 as VC0-VC7 811. Each virtual channel includes a multiplexer 813 at the upstream port and multiplexers 851 and 853 at the downstream ports, to allow access to each switch port. A serializer-deserializer (SERDES) 810, 855, and 857 is required for each port, since Express is a serial interface. Also included in FIG. 8 are single input buffer sets 831, 839, and 841 (as illustrated in FIG. 7) per port per virtual channel, having improved management requirements and ordering rules such that transactions exit the buffers in the same order as they entered the buffers according to this invention. This can be accomplished for the input buffer set with pointers that are updated following each operation, such that transactions of all types exit the buffer set in the same order or sequence in which they entered. Only a single input buffer set 831 is provided, per virtual channel, for handling traffic in the downstream path from the upstream port. Separate single input buffer sets 839 and 841 are provided at each of the downstream ports, per virtual channel, for handling traffic in the upstream direction.
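The single input buffer set and its read/write pointers can be sketched as a circular FIFO shared by all five transaction types. The slot count and method names are hypothetical; the push failure on a full buffer stands in for a condition that credit-based flow control should prevent in practice.

```python
from enum import Enum


class TxnType(Enum):
    PMW = "Posted Memory Write"
    RR = "Read Request"
    WR = "Write Request"
    RC = "Read Completion"
    WC = "Write Completion"


class InputBufferSet:
    """One FIFO per port per VC; every transaction type shares it."""

    def __init__(self, slots: int):
        self.slots = [None] * slots
        self.read_ptr = 0    # next entry to leave the buffer
        self.write_ptr = 0   # next free slot
        self.count = 0

    def push(self, txn_type: TxnType, payload) -> bool:
        if self.count == len(self.slots):
            return False     # full: should not occur if credits were honoured
        self.slots[self.write_ptr] = (txn_type, payload)
        self.write_ptr = (self.write_ptr + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None
        entry = self.slots[self.read_ptr]
        self.slots[self.read_ptr] = None
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)
        self.count -= 1
        return entry
```

Both pointers only ever advance, so entries leave in exactly the order they arrived, regardless of transaction type.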

[0039] Also included in FIG. 8 is a set of output “ping-pong” buffers 819, 843, and 845 at each port for each virtual channel, operable such that transactions exit the buffers in the same order as they entered. One of the output buffers holds the next transaction to feed the SERDES at the port output, during which time the following transaction to be serialized can be transferred into the other buffer. These input and output buffer sets are provided for each port and for each virtual channel. Also included in FIG. 8 is a non-blocking cross-bar or Xbar switch circuit 833 for steering transactions flowing out of the input buffers toward their target buffers at the appropriate target output ports. This non-blocking switch 833 allows transfers to occur between any pair of ports of the switch while simultaneously allowing transfers to occur between any other pairs of ports of the switch. Also shown in FIG. 8 are I/O Flow Control Credit and buffer ordering state machines 816, 847, and 849 that are utilized to control the input and output buffer sets and to manage the flow control credit information that is shared by the ports at each end of the links.
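The two-entry ping-pong output buffer can be sketched as below: one half feeds the SERDES while the other is loaded with the following transaction, and the halves swap roles when serialization finishes. This is a behavioral sketch with hypothetical names, not a hardware description.

```python
class PingPongOutputBuffer:
    """Two-entry output buffer for one port and one virtual channel (sketch)."""

    def __init__(self):
        self.halves = [None, None]
        self.active = 0  # index of the half currently feeding the SERDES

    def load(self, txn) -> bool:
        """Accept a transaction from the crossbar if either half is free."""
        if self.halves[self.active] is None:
            self.halves[self.active] = txn        # SERDES idle: load directly
            return True
        standby = 1 - self.active
        if self.halves[standby] is None:
            self.halves[standby] = txn            # prefetch the following transaction
            return True
        return False                              # both halves busy

    def finish_serializing(self):
        """SERDES finished the active half: free it and swap roles."""
        done = self.halves[self.active]
        self.halves[self.active] = None
        self.active = 1 - self.active
        return done
```

The standby half is only filled while the active half is occupied, so swapping roles always hands the SERDES the older of the two transactions and arrival order is preserved.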

[0040] Express defines the capability of up to eight (8) virtual channels (VC0-VC7) 811, where the highest priority VCn can be utilized for isochronous transactions when supported. VC0 is defined for the lowest priority general purpose transactions, and VC1-VC7 allow for other weighted priority traffic. In Express, transactions in different VCs or of different TCs have no ordering requirements relative to each other. An Express device must implement at least one VC (VC0). In order for multiple VCs to gain access to the device's various Express interfaces, VC multiplexers 813, 851 and 853 must be provided at the upstream port 805 and at each of the downstream ports 859 and 861, respectively.

[0041] Also included in FIG. 8 is the Express switch control logic 821, consisting essentially of the Arbiter Control, Isochronous Control, I/O Flow Credit Control, Steering, and Other Control Logic. On each virtual channel, internal arbitration is required between the outputs of the input buffer sets 831, 839, and 841 and the inputs of the output buffer sets 819, 843, and 845. The arbiter selects the packets that flow through the X-Bar switch to the output buffer sets. Once transactions have reached the output buffers at the output port of each virtual channel, internal arbitration is again required, this time between virtual channels, for access to the port SERDES in order to multiplex and serialize the winning transaction out over the appropriate Express serial I/O link.
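
The disclosure does not name the arbitration policy beyond requiring fairness. Assuming a simple round-robin policy, the arbiter between input buffer sets might be sketched as follows; one transaction moves through the crossbar per grant, and the function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def round_robin_grant(requests: List[bool], last_grant: int) -> Optional[int]:
    """Grant the first requester after last_grant, wrapping around (assumed policy)."""
    n = len(requests)
    for offset in range(1, n + 1):
        idx = (last_grant + offset) % n
        if requests[idx]:
            return idx
    return None


def drain_in_round_robin(queues: List[Deque[str]]) -> List[Tuple[int, str]]:
    """Move one transaction per grant from the input buffer sets until all are empty."""
    order: List[Tuple[int, str]] = []
    last = len(queues) - 1  # start so that input 0 is considered first
    while True:
        grant = round_robin_grant([bool(q) for q in queues], last)
        if grant is None:
            return order
        order.append((grant, queues[grant].popleft()))
        last = grant
```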

[0042] An Express switch is a very complex device. As illustrated, improving the transaction buffering and ordering requirements for Express, as herein disclosed, allows a much improved, less complex, less costly, and higher-performance buffer design and control, as shown in FIG. 7 and FIG. 8. Managing only one input buffer set and only one output buffer set (per port per virtual channel) with improved ordering requirements is much less complex than managing multiple input and output buffer sets (per port per virtual channel), which have the much more complex ordering and control requirements currently defined in the Express specification. The concepts, descriptions, and examples of the invention have been described as applied to PCI Express. However, the concepts of this invention are applicable to any serial I/O interconnect, such as RapidIO and HyperTransport implementations.

[0043] FIG. 9 illustrates a significantly improved embodiment of an Express-PCI bridge 901, which provides a serial I/O interconnect 903, a serial interface 905, and a SERDES 907 for serializing and deserializing the Express interface. The Express-PCI bridge is divided into two separate domains: the upper portion of the bridge 901 is the Express domain 917, and the lower portion of the bridge is the PCI/PCIX domain 919. All PCI/PCIX transaction bypassing requirements are handled within the PCI/PCIX domain of the Express-PCI bridge. The bridge of FIG. 9 illustrates only one secondary port, but could also include multiple secondary PCI/PCIX ports. Also included in the Express domain of the bridge are a single set of outbound buffers 911 (outbound with respect to the Root Complex) and a single set of inbound buffers 913. The outbound buffer set 911 and the inbound buffer set 913 in the Express domain of FIG. 9 correspond to the input buffers 704 and output buffers 710, respectively, illustrated in FIG. 7 for Express devices. Also illustrated in FIG. 9 is I/O flow control credit and transaction ordering control 909, which manages the credits shared across the Express link as well as the single outbound 911 and inbound 913 buffer sets. All traffic entering the Express-PCI bridge in the outbound direction is re-mapped to TC0, since it is targeting the PCI/PCIX domain. Because the PCI domain is within one plane, it does not comprehend virtual channels. Since this Express interface in the outbound direction is targeting the PCI/PCIX domain of the bridge, only the default virtual channel VC0 is included.
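
The outbound re-mapping can be pictured as a single step at the bridge's Express ingress. In the sketch below, the record type and function name are hypothetical; it assumes a transaction carries its traffic class and that PCI Express defines traffic classes TC0 through TC7.

```python
from dataclasses import dataclass, replace
from typing import Tuple

DEFAULT_VC = 0  # VC0, the only virtual channel on the bridge's outbound path


@dataclass(frozen=True)
class ExpressTransaction:
    kind: str           # e.g. "PMW", "RR", "WR"
    traffic_class: int  # TC0-TC7 as carried on the Express link


def remap_outbound(txn: ExpressTransaction) -> Tuple[ExpressTransaction, int]:
    """Hypothetical bridge ingress step: PCI/PCIX has no traffic classes or
    virtual channels, so every outbound transaction becomes TC0 on VC0."""
    if not 0 <= txn.traffic_class <= 7:
        raise ValueError("Express defines traffic classes TC0 through TC7")
    return replace(txn, traffic_class=0), DEFAULT_VC
```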

[0044] FIG. 9 also illustrates the PCI/PCIX bus interface 931 producing the PCI/PCIX bus 933. Also included are three sets of outbound transaction buffers 923 in the PCI/PCIX domain 919. One set is provided for posted memory write (PMW) transactions, another set is provided for read and write requests (RR and WR), and the third set is provided for read and write completions (RC and WC). Each of the three buffer sets provides transaction ordering as defined in table 401 in FIG. 4 and as improved by this invention, and the transaction ordering between the three buffer sets is likewise as defined in table 401 in FIG. 4 and as improved by this invention. Also illustrated in FIG. 9 are three additional sets of buffers 921 for the inbound direction. As in the outbound direction, the inbound buffers include one set for posted memory write transactions, another set for read and write requests, and a third set for read and write completions. Again, each of the three buffer sets provides transaction ordering as defined in table 401 in FIG. 4 and as improved by this invention. The actual number of buffers in each of the buffer sets 921 and 923 is implementation dependent, as long as the transaction ordering rules are met.

[0045] Also included in FIG. 9 is transaction ordering state machine control logic 925 for the outbound direction, which implements the transaction ordering requirements defined in the table of FIG. 4 as improved by this invention, and controls the transactions flowing in the outbound direction through buffer sets 923. Transaction ordering state machine control logic 926 for the inbound direction implements the transaction ordering requirements defined in the table of FIG. 4 and controls the transactions flowing in the inbound direction through buffer sets 921. In this manner, the transaction ordering requirements of legacy PCI/PCIX adapters installed behind the Express-PCI bridge (on the PCI side) are handled by the buffers, buffer management, and transaction ordering state machine control logic within the PCI/PCIX domain 919 of the Express-PCI bridge 901. In the PCI domain, residual delayed transactions are left at the head of the requesting buffers, which requires some transactions not to bypass them (to meet the producer-consumer requirements) and others to bypass them (to meet the deadlock avoidance requirements), as defined in the ordering requirements in the table of FIG. 4.

[0046] As mentioned earlier, the problem with PCI relative to possible deadlocks is that Delayed Read Requests and Delayed Write Requests leave residual transactions (once the transaction has been attempted) at the head of buffers, which can cause deadlocks if proper bypassing rules are not followed. Examples of residual transactions are Delayed Requests (Delayed Read and Delayed Write) that have been accepted across a device interface. Once a Delayed Request is attempted across a bus from a first device to a second device, the request is now in the second device, but the same Delayed Request also remains at the head of the queue in the first device. The Delayed Request must continue to be attempted from the first device to the second device until the completion transaction becomes available. Once the completion transaction is available and the Delayed Request completes across the bus, the Delayed Request in the first device is destroyed, being replaced by the Delayed Completion transaction, now in the first device, moving in the opposite direction. Therefore, for PCI, delayed request transactions result in residual delayed requests at the head of the buffer queues in the requesting device. These residual requests require bypassing rules so that certain transactions are able to bypass them and avoid deadlocks. The transactions that require such bypassing to avoid deadlocks are those at the intersection of Row A and Columns 3-6 and at the intersection of Rows D-E and Columns 3-4 in the table of FIG. 4.
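
The life cycle of a residual Delayed Request in the requesting device can be sketched as below. The class and method names are hypothetical and the model is deliberately reduced to two queues; it shows the request staying at the head of the outbound queue across retries, and a posted write behind it remaining blocked in a strict FIFO until the completion arrives.

```python
from collections import deque
from typing import Deque


class DelayedRequestMaster:
    """Hypothetical requesting side of a PCI Delayed Read/Write Request."""

    def __init__(self) -> None:
        self.outbound: Deque[str] = deque()  # requests heading to the target
        self.inbound: Deque[str] = deque()   # completions coming back

    def attempt_head(self, completion_ready: bool) -> str:
        head = self.outbound[0]
        if not completion_ready:
            # The target has latched the request but has no completion yet:
            # the request stays at the head of the queue as a residual entry.
            return "retry"
        # The residual request is destroyed and replaced by its completion,
        # which now moves in the opposite direction.
        self.outbound.popleft()
        self.inbound.append(f"completion for {head}")
        return "completed"
```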

[0047] The entries in table 401 of FIG. 4 are utilized in the PCI/PCIX domain of FIG. 9. The improved table entries are those for the request transactions (RR and WR), namely the four entries at the intersection of Rows B and C and Columns 3 and 4. The PCI/PCIX specifications and the current Express specification specify these entries as “Y/N”; that is, there are no ordering requirements between the first and second transactions of a sequence. As herein disclosed, these entries need to be “No” (the second transaction must not be allowed to pass the first transaction) in the inbound direction (toward the system processor). These entries can remain as currently specified in PCI/PCIX for the outbound direction (away from the system processor). If these entries were to remain “Y/N” for the inbound direction, PCI devices could introduce multiple delayed requests into the Express domain of the Express-PCI bridge and into the Express fabric. This could be a problem when peer PCI devices are installed behind PCI-PCI bridges designed to PCI Specification Revision 2.0 and these “2.0” bridges are behind (downstream from) the Express-PCI bridges. The 2.0 bridges will cause stalling of delayed requests headed in the outbound direction in the Express-PCI bridges when there are memory writes headed in the inbound direction in the 2.0 bridge, resulting in congestion in the Express fabric and potentially system crashes. Changing these table entries to “No” in the inbound direction restricts the PCI bus under an Express-PCI bridge to introducing only one delayed request at a time into the Express domain, thus avoiding this problem. The other table entries in FIG. 4 are the same as those defined in the PCIX specification for PCI/PCIX devices. Thus, the PCI/PCIX domain of the Express-PCI bridge meets the existing legacy PCI/PCIX transaction ordering rules, assuring the avoidance of deadlocks (including the case of peer-to-peer PCI/PCIX devices installed behind Express-PCI bridges).
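
Table 401 itself is not reproduced in this section, so the sketch below encodes only the entries discussed here. It assumes, as an interpretation rather than a statement of the figure, that Rows A-E and Columns 2-6 both follow the order PMW, RR, WR, RC, WC; any other entry is deliberately left unmodeled.

```python
from enum import Enum


class Kind(Enum):
    PMW = "posted memory write"
    RR = "read request"
    WR = "write request"
    RC = "read completion"
    WC = "write completion"


class Direction(Enum):
    INBOUND = "toward the processor"
    OUTBOUND = "away from the processor"


REQUESTS = {Kind.RR, Kind.WR}
COMPLETIONS = {Kind.RC, Kind.WC}


def may_pass(second: Kind, first: Kind, direction: Direction) -> str:
    """'Yes', 'No' or 'Y/N': may `second` pass `first`? (partial table only)"""
    if second is Kind.PMW and first is not Kind.PMW:
        return "Yes"  # assumed Row A, Columns 3-6: needed to avoid deadlock
    if second in COMPLETIONS and first in REQUESTS:
        return "Yes"  # assumed Rows D-E, Columns 3-4: needed to avoid deadlock
    if second in REQUESTS and first in REQUESTS:
        # Rows B-C, Columns 3-4: the entries changed by the disclosure
        return "No" if direction is Direction.INBOUND else "Y/N"
    raise NotImplementedError("entry not described in this section")
```

With the inbound request-request entries set to "No", a second delayed request can never overtake the first, which is what limits the PCI bus to one delayed request at a time in the Express domain.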

[0048] FIG. 10 illustrates a portion of the Express switch logic for one of the Express switch input ports shown in FIG. 8. Included in FIG. 10 are the VC multiplexer 1015, the SERDES 1010, and the serial interface 1007, producing the Express I/O interconnect 1009. Also included in FIG. 10 is an improved input buffer set 1017, which utilizes the improved ordering rules for Express devices as defined in this invention. I/O flow control credits and transaction ordering control 1016 and the output buffer set 1019 are also illustrated. As shown, all of the transaction types, including PMW, RR, WR, RC, and WC, are able to flow through the input buffer sets, and all five transaction types can also flow through the output buffer sets. In this example six input buffers are included: two reserved for PMW, two reserved for requests (RR and WR), and two reserved for completions (RC and WC). Correspondingly, one set of flow control credits would be allocated for PMW, one set for requests (RR and WR), and one set for completions (RC and WC). The I/O flow control credits and transaction ordering control 1016 in FIG. 10 control the input and output buffer sets and manage the flow control credit information exchanged between the ports at each end of the links. There are six flow control credit types: two for posted memory writes (one for headers and one for data), two for requests (one for headers and one for data), and two for completions (one for headers and one for data). Each virtual channel has its own independent flow control credit mechanism for controlling the flow of transactions between links.
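
The existing six-type scheme can be sketched as per-virtual-channel counters, where each transaction draws only from the header/data pair of its own class. The class names and initial credit values are illustrative assumptions; the point shown is that PMW traffic stalls when its own pair is exhausted even though request and completion credits remain.

```python
from enum import Enum
from typing import Dict


class CreditType(Enum):
    PMW_HEADER = "posted header"
    PMW_DATA = "posted data"
    REQ_HEADER = "request header"
    REQ_DATA = "request data"
    CPL_HEADER = "completion header"
    CPL_DATA = "completion data"


# Header/data credit pair drawn by each transaction type.
CREDIT_PAIR = {
    "PMW": (CreditType.PMW_HEADER, CreditType.PMW_DATA),
    "RR": (CreditType.REQ_HEADER, CreditType.REQ_DATA),
    "WR": (CreditType.REQ_HEADER, CreditType.REQ_DATA),
    "RC": (CreditType.CPL_HEADER, CreditType.CPL_DATA),
    "WC": (CreditType.CPL_HEADER, CreditType.CPL_DATA),
}


class LegacyCreditPool:
    """Hypothetical per-virtual-channel credits under the six-type scheme."""

    def __init__(self, initial: Dict[CreditType, int]) -> None:
        self.available = {t: initial.get(t, 0) for t in CreditType}

    def try_consume(self, kind: str, header_credits: int, data_credits: int) -> bool:
        hdr, dat = CREDIT_PAIR[kind]
        if self.available[hdr] < header_credits or self.available[dat] < data_credits:
            return False  # must wait for a credit update from the link partner
        self.available[hdr] -= header_credits
        self.available[dat] -= data_credits
        return True
```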

[0049] In FIG. 10, if a significant number of PMW transfers, requiring a significant number of PMW credits, are attempted before any other type of transaction (request or completion) enters the port and the input buffers, then two-thirds of the buffers (since four of the six buffers in this example are reserved for requests and completions) and two-thirds of the flow control credits (since those credits are reserved for request and completion transactions) sit idle as the PMW transfers flow through the input buffers. Flow control credits are periodically updated between the Express ports at each end of the links, as defined in the Express specification. Even though the input buffer set uses the improved approach illustrated in FIG. 10, credits must still be allocated for each transaction type (posted memory writes, requests, and completions) as the three transaction types flow through the single input buffer set in FIG. 10.
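
The two-thirds figure follows directly from the example partition of six buffers, two of which may hold PMW. A minimal calculation, with a hypothetical function name, compares it against a shared pool in which any buffer may hold any type:

```python
from fractions import Fraction


def idle_fraction_during_pmw_burst(total: int, reserved_for_pmw: int, shared: bool) -> Fraction:
    """Fraction of buffers (or credits) unusable while only PMW traffic arrives."""
    usable = total if shared else reserved_for_pmw
    return Fraction(total - usable, total)
```

For the FIG. 10 example, the partitioned case gives 4/6 = 2/3 idle, while the shared case gives none.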

[0050] The improved single input and output buffer sets shown in FIG. 10 enable a new approach for defining, assigning, and managing flow control credits, which improves the flow control logic, significantly improves Express performance, and reduces latency. This is accomplished by taking advantage of the fact that the input and output buffer sets in FIG. 10 are strictly ordered, with transactions exiting the buffers in the same order as they entered. The six credit types (PMW header and PMW data, Request header and Request data, and Completion header and Completion data) can be redefined and managed as only two credit types (Transaction header and Transaction data). Thus, in FIG. 10 the PMW credits, Request credits, and Completion credits would all become Transaction credits.
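
Under the two-type scheme, a transaction's credit cost no longer depends on its type. The sketch assumes the PCI Express convention that one header credit covers one transaction header and one data credit covers 16 bytes (4 DW) of payload; the pool class is hypothetical. Six 64-byte PMWs can now use all six buffers' worth of credits.

```python
from typing import Tuple

DATA_CREDIT_BYTES = 16  # one Express data credit covers 4 DW (16 bytes)


def credits_required(payload_bytes: int) -> Tuple[int, int]:
    """(header credits, data credits) for one transaction of any type."""
    return 1, (payload_bytes + DATA_CREDIT_BYTES - 1) // DATA_CREDIT_BYTES


class TransactionCreditPool:
    """Hypothetical two-type scheme: one header pool and one data pool per
    virtual channel, shared by PMW, request and completion traffic."""

    def __init__(self, header: int, data: int) -> None:
        self.header = header
        self.data = data

    def try_consume(self, payload_bytes: int) -> bool:
        hdr, dat = credits_required(payload_bytes)
        if self.header < hdr or self.data < dat:
            return False  # wait for the next periodic credit update
        self.header -= hdr
        self.data -= dat
        return True
```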

[0051] Without this improvement, if a large number of PMW data transfers are attempted in sequence at a given input port before any other type of transaction (request or completion) flows through the buffer set, then before all of the PMW transactions can be moved across the switch, the arbiter would go around the arbitration loop (using some fairness algorithm) to all of the other input port buffers and all of the other virtual channels, giving each a chance to access its selected output port and move its transactions across the switch. Depending on the number of PMW transactions being attempted, this could result in a number of arbitration loops (to all ports and virtual channels) before all of the PMW transactions can be moved across the switch through the given port. In accordance with the present disclosure, however, all six of the buffers can be utilized for whatever transactions arrive, regardless of transaction type and in whatever combination they flow through the buffers (in strict order). This avoids the situation in which two-thirds of the input buffers and two-thirds of the flow control credits sit idle during block transfers.
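
A rough estimate of the number of arbitration loops can be made under a simplifying assumption that is not part of the disclosure: on each visit the arbiter drains every usable buffer at the port, and the sender refills them before the next visit. The loop count is then the ceiling of the burst length divided by the usable buffers.

```python
def arbitration_loops(pmw_count: int, usable_buffers: int) -> int:
    """Simplified model: each arbiter visit drains all usable buffers at the port."""
    if usable_buffers <= 0:
        raise ValueError("need at least one usable buffer")
    return -(-pmw_count // usable_buffers)  # ceiling division
```

With the FIG. 10 example, a 12-transaction PMW burst would need 6 loops with two PMW buffers but only 2 loops when all six buffers are usable.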

[0052] With this disclosure, there is no need to allocate credits based on transaction type, since the six credit types (header and data for PMW, requests, and completions) are now treated as just two generic flow control credit types: transaction header and transaction data. In this manner, all six of the available buffers and all of the available flow control credits can be utilized, allowing data to be moved across Express switches (and other Express devices) with lower latency and improved performance. Also, for a given chip size and buffer space allocation, the number and size of the buffers allocated to each set of input buffers per port become a design tradeoff, possibly leaving room in the chip for more virtual channels if desired. Regardless of the design tradeoffs selected, moving a significantly larger number of sequential PMW transactions across the switch while requiring fewer cycles around the arbitration loop results in lower latency and improved performance.

[0053] For backward compatibility, the switch will handle both flow control mechanisms: the previous approach with six credit types for PMW, requests, and completions, and the improved approach with two transaction credit types (transaction header and transaction data). During link configuration or link training, a link will initially utilize the existing flow control credit method. If both ends of the link can support the improved approach, both ends of the link will then switch to the new approach during link training, in accordance with the present disclosure. As transactions move across a switch through the input side of a first port and the output side of a second port, the input side of the first port could be utilizing the improved flow control method herein disclosed, while the output side of the second port could be utilizing the existing flow control method according to the Express specification.
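
The selection rule during link training reduces to a small decision. The disclosure does not say how support is advertised between link partners, so the two boolean inputs below stand in for that exchange and are purely hypothetical.

```python
from enum import Enum


class FlowControlMethod(Enum):
    SIX_TYPE = "existing Express six credit types"
    TWO_TYPE = "transaction header + transaction data"


def negotiate(local_supports_two_type: bool, remote_supports_two_type: bool) -> FlowControlMethod:
    # Every link starts on the existing method ...
    method = FlowControlMethod.SIX_TYPE
    # ... and switches only if both ends of the link support the improved one.
    if local_supports_two_type and remote_supports_two_type:
        method = FlowControlMethod.TWO_TYPE
    return method
```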

[0054] Also, as transactions move across the switch through the input side of a first port and the output side of a third port, the input side of the first port and the output side of the third port could both be utilizing the improved flow control method. Both the input and output sides of a given switch port must utilize the same flow control mechanism as the other end of the serial link to which that port is connected.
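
Because each port follows the method negotiated on its own link, a transaction crossing the switch may be accounted under one scheme at its ingress port and another at its egress port. The sketch below, with hypothetical port names and counter labels, returns the header and data counters used on each side.

```python
from typing import Dict, Tuple

SIX_TYPE_CLASS = {"PMW": "posted", "RR": "request", "WR": "request",
                  "RC": "completion", "WC": "completion"}


def credit_keys(method: str, kind: str) -> Tuple[str, str]:
    """Header and data counters used for `kind` under a port's method."""
    if method == "two-type":
        return ("transaction header", "transaction data")
    if method == "six-type":
        cls = SIX_TYPE_CLASS[kind]
        return (f"{cls} header", f"{cls} data")
    raise ValueError(f"unknown method {method!r}")


def forward(kind: str, ports: Dict[str, str], ingress: str, egress: str) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Ingress side follows its own link's method; egress side follows its link's."""
    return credit_keys(ports[ingress], kind), credit_keys(ports[egress], kind)
```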

[0055] The method and apparatus of the present invention have been described in connection with a preferred embodiment as disclosed herein. Although an embodiment of the present invention has been shown and described in detail herein, along with certain variants thereof, many other varied embodiments that incorporate the teachings of the invention may be easily constructed by those skilled in the art, and even included or integrated into a processor or CPU or other larger system integrated circuit or chip. The disclosed methodology may also be implemented solely or partially in program code stored on a CD, disk or diskette (portable or fixed), or other memory device, from which it may be loaded into system memory and executed to achieve the beneficial results described herein. Accordingly, the present invention is not intended to be limited to the specific form or example set forth herein, but on the contrary, it is intended to cover such alternatives, modifications, and equivalents as can reasonably be included within the spirit and scope of the invention.

1. A method for processing information transactions from a first device to a second device, said method comprising: determining an amount of transaction flow control credits required to store first transaction information originating at said first device; sending a periodic credit message from said second device to said first device, said credit message containing transaction flow control credits for an amount of currently available storage space in said second device; and transferring said first transaction information from said first device to said second device only if said currently available transaction flow control credits are sufficient to store said first transaction information.
2. The method as set forth in claim 1 wherein said method is repeated periodically if said currently available transaction flow control credits are insufficient to store said first transaction information.
3. The method as set forth in claim 1 and further including: storing said first transaction information in storage in said first device; and removing said first transaction information from said first device when said first transaction information is transmitted to said second device.
4. The method as set forth in claim 3 and further including storing said first transaction information in storage in said second device when said first transaction information is transmitted to said second device.
5. The method as set forth in claim 1 and further including: determining which of a plurality of flow control credit methods are supported by said first and said second devices, said flow control credit methods including a first flow control credit method and a second flow control credit method; selecting said first flow control credit method if said first and said second devices support said first flow control credit method, said first flow control credit method being operable for utilizing two flow control credit types including a transaction header credit type and a transaction data credit type; and selecting said second flow control credit method if one or both of said first and second devices supports only said second flow control credit method, said second flow control credit method being operable for utilizing more than two flow control credit types.
6. The method as set forth in claim 5 wherein available buffer space is allocated and managed for said first flow control method based on two flow control credit types including transaction header credits and transaction data credits.
7. The method as set forth in claim 5 wherein said flow control credit types for said second flow control credit method include credit types for posted memory write header credits, posted memory write data credits, request header credits, request data credits, completion header credits and completion data credits.
8. The method as set forth in claim 6 wherein available buffer space is allocated and managed for said second flow control method based on more than two flow control credit types including posted memory write header credits, posted memory write data credits, request header credits, request data credits, completion header credits and completion data credits.
9. An information processing system for transferring information transaction requests from a first device to a second device, said system comprising: means for determining an amount of transaction flow control credits required to store first transaction request information originating at said first device; means for sending a periodic credit message from said second device to said first device, said credit message containing transaction flow control credits for an amount of currently available storage space in said second device; and means for transferring said first transaction information from said first device to said second device only if said currently available transaction flow control credits are sufficient to store said first transaction information.
10. The information processing system as set forth in claim 9 wherein said processing is repeated periodically if said currently available transaction flow control credits are insufficient to store said first transaction information.
11. The information processing system as set forth in claim 10 and further including: storage means for storing said first transaction information in said first device; and means for removing said first transaction information from said first device when said first transaction information is transmitted to said second device.
12. The information processing system as set forth in claim 11 and further including means for storing said first transaction information in said second device when said first transaction information is transmitted to said second device.
13. The information processing system as set forth in claim 9 and further including: means for determining which of a plurality of flow control credit methods are supported by said first and said second devices, said flow control credit methods including a first flow control credit method and a second flow control credit method; and means for selecting said first flow control credit method if said first and said second devices support said first flow control credit method, said first flow control credit method being operable for utilizing two flow control credit types including a transaction header credit type and a transaction data credit type.
14. The information processing system as set forth in claim 11 and further including means for allocating and managing available buffer space based on a first flow control method using only two flow control credit types.
15. The information processing system as set forth in claim 13 and further including means for selecting said second flow control credit method if one or both of said first and second devices supports only said second flow control credit method, said second flow control credit method being operable for utilizing more than two flow control credit types.
16. The information processing system as set forth in claim 15 wherein said flow control credit types for said second flow control credit method include credit types for posted memory write header credit, posted memory write data credit, request header credit, request data credit, completion header credit and completion data credit.
17. The information processing system as set forth in claim 16 and further including means for allocating and managing available buffer space based on a second flow control method using more than two flow control credit types.